Tag: Reinforcement Learning


SARSA
SARSA (State-Action-Reward-State-Action) is an on-policy reinforcement learning algorithm that learns by updating its value estimates based on the actions it actually takes. The name comes from the sequence of information each update uses: the agent observes the current state (S), takes an action (A), receives a reward (R), moves to a new state (S), and selects the next action (A) before updating its estimate. Unlike Q-learning, whose update target assumes the optimal action will be taken in the next state, SARSA's target uses the action the agent will actually take next, including any exploratory random actions.
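A minimal sketch of the SARSA update on a hypothetical toy environment (a 4-state corridor with left/right actions, invented here for illustration), using epsilon-greedy action selection:

```python
import random
from collections import defaultdict

def sarsa_update(Q, s, a, r, s2, a2, alpha=0.5, gamma=0.9):
    """One SARSA backup. Q-learning would use max over actions in s2;
    SARSA instead uses a2, the action the agent will actually take."""
    Q[(s, a)] += alpha * (r + gamma * Q[(s2, a2)] - Q[(s, a)])

def step(s, a):
    """Toy 4-state corridor: action 0 moves left, 1 moves right.
    Reaching state 3 gives reward 1 and ends the episode."""
    s2 = max(0, s - 1) if a == 0 else min(3, s + 1)
    return s2, (1.0 if s2 == 3 else 0.0), s2 == 3

def epsilon_greedy(Q, s, eps, rng):
    if rng.random() < eps:
        return rng.randrange(2)
    return max((0, 1), key=lambda a: Q[(s, a)])

rng = random.Random(0)
Q = defaultdict(float)  # Q values for terminal state stay 0
for episode in range(200):
    s = 0
    a = epsilon_greedy(Q, s, 0.1, rng)
    done = False
    while not done:
        s2, r, done = step(s, a)
        a2 = epsilon_greedy(Q, s2, 0.1, rng)
        sarsa_update(Q, s, a, r, s2, a2)
        s, a = s2, a2
```

After training, "right" actions near the goal carry the highest values; because the update target includes exploratory actions, SARSA tends toward safer policies than Q-learning when exploration is risky.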
Sun Sep 21 2025
Successor Representation
Successor Representation (SR) is a reinforcement learning framework that decomposes value functions into two separate components: a representation of future state occupancy and a prediction of immediate rewards. Instead of directly learning the value of being in a state, SR learns the expected discounted future visitation frequencies—essentially asking "if I start in state s and follow my policy, how much time will I spend in each other state?" This representation, combined with separate reward predictions, occupies a middle ground between model-free methods (like Q-learning) and model-based methods, enabling faster adaptation when rewards change but the environment dynamics remain constant.
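A small sketch of the SR decomposition on a hypothetical 3-state chain (an invented example, with a fixed policy inducing transition matrix P). For a fixed policy, the SR matrix has the closed form M = (I - γP)⁻¹, and values factor as V = Mr:

```python
import numpy as np

# Hypothetical 3-state chain under a fixed policy: state 0 -> 1 -> 2,
# and state 2 self-loops.
P = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 1.0]])
gamma = 0.9

# Successor representation: M[s, s'] = expected discounted number of
# future visits to s' starting from s. Closed form for a fixed policy.
M = np.linalg.inv(np.eye(3) - gamma * P)

# Value decomposes as V = M @ r: occupancy (M) times immediate reward (r).
r_old = np.array([0.0, 0.0, 1.0])
V_old = M @ r_old

# If only the rewards change, V is revalued instantly by re-multiplying;
# M encodes the unchanged dynamics and needs no relearning.
r_new = np.array([1.0, 0.0, 0.0])
V_new = M @ r_new
```

This is what gives SR its middle-ground character: like a model-based method it revalues states immediately after a reward change, but like a model-free method M itself can be learned with simple TD-style updates rather than by planning through a full transition model.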
Sun Sep 21 2025